This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA requires joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the form of the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as series of words, which does not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these representations. This shows significant benefit over the sequential processing of LSTMs. The overall efficacy of our approach is demonstrated by significant improvements over the state-of-the-art, from 71.2% to 74.4% in accuracy on the "abstract scenes" multiple-choice benchmark, and from 34.7% to 39.1% in accuracy over pairs of "balanced" scenes, i.e. images with fine-grained differences and opposite yes/no answers to the same question.
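To make the core idea concrete, the following is a minimal sketch (not the authors' exact architecture) of the two-graph setup the abstract describes: a graph over question words and a graph over scene objects, each updated by one round of neighbourhood aggregation, then pooled and combined to score candidate answers. The fully-connected edge structure, feature dimensions, pooling, and scoring head are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(node_feats: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One round of message passing: each node aggregates its neighbours' features."""
    messages = adj @ node_feats                  # sum of neighbour features
    degree = adj.sum(axis=1, keepdims=True) + 1e-8
    aggregated = messages / degree               # mean over neighbours
    return np.tanh(aggregated @ weight)          # transformed, updated node features

def pool(node_feats: np.ndarray) -> np.ndarray:
    """Collapse a graph into a single vector (mean pooling, for simplicity)."""
    return node_feats.mean(axis=0)

# Toy question graph: 5 word embeddings, fully connected (no self-loops).
word_feats = rng.normal(size=(5, 16))
word_adj = np.ones((5, 5)) - np.eye(5)

# Toy scene graph: 3 object feature vectors, fully connected.
obj_feats = rng.normal(size=(3, 16))
obj_adj = np.ones((3, 3)) - np.eye(3)

W_q = rng.normal(size=(16, 16))
W_s = rng.normal(size=(16, 16))

q_vec = pool(propagate(word_feats, word_adj, W_q))   # question representation
s_vec = pool(propagate(obj_feats, obj_adj, W_s))     # scene representation

# Joint representation scored against candidate answers (hypothetical linear head).
joint = np.concatenate([q_vec, s_vec])
W_ans = rng.normal(size=(joint.size, 4))             # 4 candidate answers
scores = joint @ W_ans
print("answer scores:", scores)
```

In the paper's full model the edges and update functions carry learned, structure-aware weights rather than the uniform mean aggregation used above; the sketch only illustrates why graph-structured inputs can represent multiple object instances and word relations that a single CNN vector or a word-by-word LSTM pass would conflate.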